In a celebrated work by Hoeffding [J. Amer. Statist. Assoc. 58
(1963) 13-30], several inequalities for tail probabilities of sums $M_n =
X_1 + \cdots + X_n$ of bounded independent random variables $X_j$ were
proved. These inequalities had a considerable impact on the devel- 
opment of probability and statistics, and remained unimproved un- 
til 1995 when Talagrand [Inst. Hautes Etudes Sci. Publ. Math. 81 
(1995a) 73-205] inserted certain missing factors in the bounds of 
two theorems. By similar factors, a third theorem was refined by 
Pinelis [Progress in Probability 43 (1998) 257-314] and refined (and 
extended) by me. In this article, I introduce a new type of inequality. 
Namely, I show that $\mathbb{P}\{M_n \ge x\} \le c\,\mathbb{P}\{S_n \ge x\}$, where $c$ is an absolute
constant and $S_n = \varepsilon_1 + \cdots + \varepsilon_n$ is a sum of independent identically
distributed Bernoulli random variables (a random variable is called
Bernoulli if it assumes at most two values). The inequality holds for
those $x \in \mathbb{R}$ where the survival function $x \mapsto \mathbb{P}\{S_n \ge x\}$ has a jump
down. For the remaining $x$ the inequality still holds provided that the
function between the adjacent jump points is interpolated linearly or
log-linearly. If it is necessary to estimate $\mathbb{P}\{S_n \ge x\}$, special bounds
can be used for binomial probabilities. The results extend to martin- 
gales with bounded differences. It is apparent that Theorem 1.1 of 
this article is the most important. The inequalities have applications 
to measure concentration, leading to results of the type where, up to 
an absolute constant, the measure concentration is dominated by the 
concentration in a simplest appropriate model; such results will be
considered elsewhere. 

1. Introduction and results. To illustrate the flavor of the inequalities 
provided below, let us start with the special case of a sum $Z_n = Y_1 + \cdots + Y_n$
of bounded independent random variables such that $\mathbb{P}\{0 \le Y_k \le 1\} = 1$ and
$\mathbb{E}Y_k = p_k$ for all $k$. Then

(1.1) $\mathbb{P}\{Z_n \ge x\} \le e\,\mathbb{P}\{\varepsilon_1 + \cdots + \varepsilon_n \ge x\}$, $e = 2.718\ldots$,

for integer $x \in \mathbb{Z}$, where $\varepsilon_1,\ldots,\varepsilon_n$ are independent identically distributed
(i.i.d.) Bernoulli random variables that assume values 0 and 1 such that
$\mathbb{P}\{\varepsilon_k = 1\} = p$ with $p = (p_1 + \cdots + p_n)/n$. The bound (1.1) is a very special
case of Theorem 1.2. The following bound (1.2) is independent of n and 
is much rougher than (1.1). Furthermore, usually bounds of type (1.1) are 
more convenient in applications than bounds of type (1.2). We have 

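(1.2) $\mathbb{P}\{Z_n \ge x\} \le \frac{e^3}{2}\,\mathbb{P}\left\{\eta \ge \frac{x}{1-p}\right\}$

for $x$ such that $x/(1-p)$ is an integer, where $\eta$ is a Poisson random variable
with parameter $\lambda$ such that

$\lambda = pn/(1-p)$, $\mathbb{P}\{\eta = k\} = \lambda^k \exp\{-\lambda\}/k!$ for $k = 0, 1, 2, \ldots$.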
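
As a quick numerical sanity check of (1.1), the following sketch takes the $Y_k$ to be independent 0/1 variables with unequal success probabilities. This is only one admissible special case, chosen here for illustration; the function names are hypothetical and do not come from the article. Both tails are computed exactly and compared at every integer $x$.

```python
from math import comb, e

def sum_pmf(ps):
    """Exact pmf of Z_n = Y_1 + ... + Y_n for independent 0/1 Y_k with P{Y_k = 1} = ps[k]."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            nxt[k] += w * (1.0 - p)      # Y_k = 0
            nxt[k + 1] += w * p          # Y_k = 1
        pmf = nxt
    return pmf

def binomial_tail(n, p, x):
    """P{eps_1 + ... + eps_n >= x} for i.i.d. 0/1 variables with P{eps = 1} = p."""
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(max(x, 0), n + 1))

def bound_1_1_holds(ps, tol=1e-12):
    """Check P{Z_n >= x} <= e * P{S_n >= x} at all integers x = 0, ..., n."""
    n = len(ps)
    p_bar = sum(ps) / n                  # p = (p_1 + ... + p_n)/n as in (1.1)
    pmf = sum_pmf(ps)
    return all(sum(pmf[x:]) <= e * binomial_tail(n, p_bar, x) + tol
               for x in range(n + 1))
```

For example, `bound_1_1_holds([0.05, 0.1, 0.5, 0.9, 0.3])` compares the two tails for five heterogeneous trials.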

The Introduction is organized as follows. First formulations of the results, 
namely, of Theorems 1.1-1.3 are provided. Then their relationships to Ho- 
effding's inequalities are discussed, references are provided and the methods 
are explained. Theorem 1.1 seems to be the most important. It has nice ap- 
plications to the measure concentration; such applications will be addressed 
elsewhere. 

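Henceforth replace the independence assumption by a martingale type
dependence. Let

$\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n \subset \mathcal{F}$

be a family of $\sigma$-algebras of a measurable space $(\Omega, \mathcal{F})$. Let $M_n = X_1 + \cdots + X_n$
be a martingale with differences $X_k = M_k - M_{k-1}$. Define $M_0 = 0$.

The simplest thinkable nontrivial martingale is a sum $S_n = \varepsilon_1 + \cdots + \varepsilon_n$ of
$n$ i.i.d. Bernoulli random variables. A random variable (or its distribution)
is called Bernoulli if it assumes at most two values with positive probability.
Let $\sigma > 0$ and $b > 0$. By $\varepsilon = \varepsilon(\sigma^2, b)$ denote a Bernoulli random variable such
that

(1.3) $\mathbb{E}\varepsilon = 0$, $\mathbb{E}\varepsilon^2 = \sigma^2$, $\mathbb{P}\{\varepsilon = b\} > 0$.

It is easy to check that

$\mathbb{P}\{\varepsilon = -\sigma^2/b\} = b^2/(b^2 + \sigma^2)$, $\mathbb{P}\{\varepsilon = b\} = \sigma^2/(b^2 + \sigma^2)$.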
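
The two-point law in (1.3) can be checked directly. The sketch below (helper names are illustrative, not from the article) builds $\varepsilon(\sigma^2, b)$ as a list of (value, probability) pairs and computes its first two moments.

```python
def bernoulli_eps(sigma2, b):
    """Two-point law eps(sigma^2, b): values -sigma^2/b and b with the stated probabilities."""
    total = b * b + sigma2
    return [(-sigma2 / b, b * b / total), (b, sigma2 / total)]

def moment(dist, r):
    """r-th moment of a finite discrete law given as (value, probability) pairs."""
    return sum(w * v**r for v, w in dist)
```

With `dist = bernoulli_eps(0.3, 2.0)`, `moment(dist, 1)` is zero and `moment(dist, 2)` equals 0.3 up to rounding, in agreement with (1.3).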

Assuming (one-sided) boundedness of the differences $X_k$, in this article it is
shown that up to an absolute constant factor the tail probability $\mathbb{P}\{M_n \ge x\}$
is dominated by the probability $\mathbb{P}\{S_n \ge x\}$. The result can be interpreted by
saying that the behavior of tail probabilities of martingales is controlled in a
very precise way by the simplest possible stochastic experiment: a series of
possibly asymmetric coin tosses. This is not unexpected due to a common
belief that Bernoulli random variables are those that are the most stochastic.
It is less unexpected that one can provide a relatively simple proof of this
fact.

For differences $X_k$ of a martingale $M_n$, consider the following boundedness
condition: There exists a nonrandom $b > 0$ such that

(1.4) $\mathbb{P}\{X_k \le b\} = 1$ for $k = 1,\ldots,n$.

For the conditional variances $s_k^2 = \mathbb{E}(X_k^2 \mid \mathcal{F}_{k-1})$ of differences $X_k$ of $M_n$,
consider the following boundedness condition: There exist nonrandom $\sigma_k \ge 0$
such that

(1.5) $\mathbb{P}\{s_k^2 \le \sigma_k^2\} = 1$ for $k = 1,\ldots,n$.

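Theorem 1.1. Assume that the differences $X_k$ of a martingale $M_n$ satisfy
the conditions (1.4) and (1.5). Then, for all $x \in \mathbb{R}$, we have

(1.6) $\mathbb{P}\{M_n \ge x\} \le \frac{e^2}{2}\,\mathbb{P}^\circ\{S_n \ge x\}$

with $e^2/2 < 3.7$, where $S_n$ is a sum of $n$ independent copies of a Bernoulli
random variable $\varepsilon = \varepsilon(\sigma^2, b)$ with $\sigma^2 = (\sigma_1^2 + \cdots + \sigma_n^2)/n$ (the meaning of $\mathbb{P}^\circ$
is explained below). The inequality (1.6) yields

(1.7) $\mathbb{P}\{M_n \ge x\} \le \frac{e^2}{2}\,\mathbb{P}^\circ\left\{\eta \ge \lambda + \frac{x}{b}\right\}$,

where $\eta$ is a Poisson random variable with the parameter $\lambda = (\sigma_1^2 + \cdots + \sigma_n^2)/b^2$.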
The bound (1.7) is much rougher compared with (1.6) because it has to
cover the case $n = \infty$, which supplies the heaviest tails. In general, tails of a
sum of independent, possibly nonidentically distributed Bernoulli random
variables can have a complicated structure.

Let me explain the meaning of $\mathbb{P}^\circ$. Write $B(x) = \mathbb{P}\{S_n \ge x\}$ for the
survival function of $S_n$. For $x$ such that $B(x) = 1$ or $B(x) = 0$ or when the
function $B$ has a positive jump down, understand $\mathbb{P}^\circ$ just as probability.
Let $B^\circ$ be a log-concave hull of $B$, that is, a minimal function such
that $B \le B^\circ$ and the function $x \mapsto -\log B^\circ(x)$ is a convex function. Define
$\mathbb{P}^\circ\{S_n \ge x\} = B^\circ(x)$. It is easy to see (cf. Lemma 4.1) that in the case of
the binomial or Poisson survival function $B$, the function $B^\circ$ is a log-linear
interpolation of $B$: if $x < z < y$ and $x$ and $y$ are adjacent points where $B$
has positive jumps down, then

(1.8) $B^\circ(z) = B^{1-\lambda}(x)\,B^{\lambda}(y)$, if $z = (1 - \lambda)x + \lambda y$, $0 < \lambda < 1$.

Similarly I introduce the linear interpolation of $B$ by writing $B^{l}(x) =
B(x)$, for $x$ such that $B(x) = 1$ or $B(x) = 0$ or the function $B$ has a positive
jump down, and

$B^{l}(z) = (1 - \lambda)B(x) + \lambda B(y)$ for $x, y, z, \lambda$ as in (1.8).

We have $B \le B^\circ \le B^{l}$.
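
The two interpolations can be made concrete for a binomial survival function. The sketch below is a minimal illustration under the assumption that $S_n$ is an uncentered sum of 0/1 variables, so that the jump points are the integers $0,\ldots,n$ (the article works with centered variables, which only shifts the argument). The function names are hypothetical.

```python
from math import ceil, comb, floor

def surv_int(n, p, k):
    """B(k) = P{S_n >= k} at an integer k, for S_n ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k, n + 1))

def surv(n, p, z):
    """B(z) = P{S_n >= z} for real z (a left-continuous step function)."""
    return surv_int(n, p, ceil(z))

def surv_loglinear(n, p, z):
    """Log-linear interpolation (1.8) between adjacent jump points floor(z), floor(z) + 1."""
    if z <= 0:
        return 1.0
    if z > n:
        return 0.0
    k = floor(z)
    lam = z - k
    return surv_int(n, p, k) ** (1.0 - lam) * surv_int(n, p, k + 1) ** lam

def surv_linear(n, p, z):
    """Linear interpolation between the same adjacent jump points."""
    if z <= 0:
        return 1.0
    if z > n:
        return 0.0
    k = floor(z)
    lam = z - k
    return (1.0 - lam) * surv_int(n, p, k) + lam * surv_int(n, p, k + 1)
```

Evaluating the three functions on a grid of $z$ values reproduces the ordering $B \le B^\circ \le B^{l}$, with equality at the jump points.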

For differences $X_k$ of a martingale $M_n$, consider the boundedness
condition

(1.9) $\mathbb{P}\{-p_k \le X_k \le 1 - p_k\} = 1$ for $k = 1,\ldots,n$,

where $p_k$ are nonrandom (it is clear that $0 \le p_k \le 1$).

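Theorem 1.2. Assume that the differences $X_k$ of a martingale $M_n$ satisfy
the condition (1.9). Then, for $x \in \mathbb{R}$, we have

(1.10) $\mathbb{P}\{M_n \ge x\} \le e\,\mathbb{P}^\circ\{S_n \ge x\}$

with $e < 2.72$, where $S_n = \varepsilon_1 + \cdots + \varepsilon_n$ is a sum of $n$ independent copies of
a Bernoulli random variable

$\varepsilon = \varepsilon(p - p^2, 1 - p)$ with $p = (p_1 + \cdots + p_n)/n$.

Furthermore, we have

(1.11) $\mathbb{P}\{M_n \ge x\} \le \frac{e^3}{2}\,\mathbb{P}^\circ\left\{\eta \ge \lambda + \frac{x}{1-p}\right\}$

with $e^3/2 < 10.1$, where $\eta$ is a Poisson random variable with $\lambda = pn/(1 - p)$.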

It is easy to check that the Bernoulli random variable e from Theorem 1.2 
satisfies 

$\mathbb{P}\{\varepsilon = -p\} = 1 - p$, $\mathbb{P}\{\varepsilon = 1 - p\} = p$.

By an application of (1.10) to $-M_n$, one can derive bounds for $\mathbb{P}\{M_n \le x\}$.
For differences $X_k$ of a martingale $M_n$, consider the following boundedness
conditions: There exist nonrandom $b_k \ge 0$ such that, for $k = 1,\ldots,n$,

(1.12) $\mathbb{P}\{X_k \le b_k\} = 1$

and

(1.13) $\mathbb{P}\{|X_k| \le b_k\} = 1$.

Write

(1.14) $a_k = \max\{b_k, \sigma_k\}$, $a^2 = (a_1^2 + \cdots + a_n^2)/n$,

where $\sigma_k$ are from the condition (1.5).

Theorem 1.3. Assume that the differences $X_k$ of a martingale $M_n$
satisfy the condition (1.5) and the one-sided boundedness condition (1.12).
Then, for all $x \in \mathbb{R}$, we have

(1.15) $\mathbb{P}\{M_n \ge x\} \le \frac{2e^3}{9}\,\mathbb{P}^\circ\{S_n \ge x\}$

with $2e^3/9 < 4.47$, where $S_n$ is a sum of $n$ independent copies of a symmetric
Bernoulli random variable $\varepsilon = \varepsilon(a^2, a)$ with $a^2$ defined by (1.14). The
inequality (1.15) implies

(1.16) $\mathbb{P}\{M_n \ge x\} \le \frac{2e^3}{9}\left(1 - \Phi\left(\frac{x}{a\sqrt{n}}\right)\right)$,

where $\Phi$ is the standard normal distribution function.

The symmetric Bernoulli random variable from Theorem 1.3 satisfies
$\mathbb{P}\{\varepsilon = \pm a\} = 1/2$.

The following corollary is less general compared to Theorem 1.3. 

Corollary 1.4. Assume that the differences $X_k$ of a martingale $M_n$
satisfy the symmetric boundedness condition (1.13). Then, for all $x \in \mathbb{R}$, the
bounds (1.15) and (1.16) of Theorem 1.3 hold, replacing $a^2$ with
$(b_1^2 + \cdots + b_n^2)/n$.
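
To see the normal bound (1.16) at work in the setting of Corollary 1.4, the sketch below takes $X_k = b_k \varepsilon_k$ with independent symmetric signs $\varepsilon_k = \pm 1$ (an illustrative special case), computes the tail exactly by enumerating all sign patterns, and compares it with the right-hand side of (1.16) as reconstructed above. The function names and the weights are assumptions of this example.

```python
from itertools import product
from math import e, erfc, sqrt

def weighted_sign_tail(bs, x):
    """P{b_1 eps_1 + ... + b_n eps_n >= x} for independent symmetric signs, by enumeration."""
    hits = sum(1 for signs in product((-1, 1), repeat=len(bs))
               if sum(s * b for s, b in zip(signs, bs)) >= x)
    return hits / 2 ** len(bs)

def normal_bound(bs, x):
    """(2 e^3 / 9) * (1 - Phi(x / (a sqrt(n)))) with a^2 = (b_1^2 + ... + b_n^2) / n."""
    scale = sqrt(sum(b * b for b in bs))      # equals a * sqrt(n)
    c = 2.0 * e**3 / 9.0
    return c * 0.5 * erfc(x / (scale * sqrt(2.0)))
```

The enumeration costs $2^n$ operations, so this check is only practical for small $n$.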

Theorems 1.1-1.3 show that the martingale type dependence does not in- 
fluence the bounds for tail probabilities much compared to the independent, 
the i.i.d. and even the i.i.d. Bernoulli cases. 

Most probably, the values of constants in Theorems 1.1-1.3 are not opti- 
mal; the preferred intention herein was to simplify the proofs as far as possi- 
ble. A more powerful method that can improve constants and the structure 
of the bounds was used by Bentkus (2001). A bound from Bentkus (2001) 
applies to the special case of Theorem 1.2 when = 1/2, and is precise for 
integer x (a bound which is precise for all x is in preparation). A conse- 
quence is that constants in the bounds (1.6), (1.10) and (1.15) of Theorems 
1.1-1.3 cannot be smaller than 2, and these constants, say c, have to satisfy 

$2 \le c \le 3.7$, $2 \le c \le 2.72$, $2 \le c \le 4.47$,

respectively, which means that space for improvement is restricted. In the 
case of Theorem 1.1, the multiplicative factor of losses in (1.6) is at most 
1.85. In contrast to the martingale dependence, finding precise values of these 
constants in the independent and i.i.d. cases is considered a very difficult 
mathematical problem. For a given $n$, let $c_n$ be the best possible constant
in Theorem 1.1. An impression that the sequence $c_n$ is increasing as $n \to \infty$
and that $\lim_{n\to\infty} c_n = 2$ is supported by the fact that $c_1 = 1.555884$.
Another supporting heuristic argument comes from the analysis of constants
in the Berry-Esseen bounds in cases $n = 1$ and $n = \infty$ in Bentkus and Kirsha
(1989) and Bentkus (1994), where a similar picture was observed.

One cannot replace $\mathbb{P}^\circ\{S_n \ge x\}$ in Theorems 1.1-1.3 with $\mathbb{P}\{S_n \ge x\}$. This
restriction clearly follows from the results (approach) of Bentkus (2001). A truly
simple proof is provided in Section 4 as Lemma 4.8.

Let us compare Hoeffding's (1963) inequalities with bounds of Theorems 
1.1-1.3. By Theorem 1 in Hoeffding (1963), under the conditions and nota- 
tion of Theorem 1.2, we have 

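(1.17) $\mathbb{P}\{M_n \ge x\} \le H^n(p + x/n;\, p)$,

where, for $0 < p < 1$,

(1.18) $H(a; p) = \left(\frac{p}{a}\right)^{a}\left(\frac{1-p}{1-a}\right)^{1-a}$ for $p < a < 1$

and

$H(a; p) = 1$ for $a \le p$; $H(a; p) = 0$ for $a > 1$.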
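
For orientation, the Hoeffding function (1.18) is easy to evaluate. The sketch below (illustrative names; the value at $a = 1$ is taken as the limit $p$, which is a convention of this example rather than a statement from the text) computes the right-hand side of (1.17) and an exact binomial tail for comparison in the i.i.d. 0/1 case, where $M_n = S - np$ with $S$ binomial.

```python
from math import comb

def hoeffding_H(a, p):
    """Hoeffding function H(a; p) from (1.18), for 0 < p < 1."""
    if a <= p:
        return 1.0
    if a > 1.0:
        return 0.0
    if a == 1.0:
        return p                     # limiting value as a -> 1 (convention of this sketch)
    return (p / a) ** a * ((1.0 - p) / (1.0 - a)) ** (1.0 - a)

def hoeffding_bound(n, p, x):
    """Right-hand side of (1.17): H^n(p + x/n; p)."""
    return hoeffding_H(p + x / n, p) ** n

def binomial_tail(n, p, k):
    """P{S >= k} for S ~ Binomial(n, p) and integer k."""
    return sum(comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(max(k, 0), n + 1))
```

With $n = 20$ and $p = 0.3$, comparing `binomial_tail(n, p, k)` against `hoeffding_bound(n, p, k - n * p)` for $k$ above the mean shows how much room the product-form bound leaves.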

Among all inequalities that have a product form, Hoeffding's bounds are 
the best possible; see Lemma 4.7. Naturally, the product structure in our 
bounds is lost. 

Hoeffding's inequalities remained unimproved until 1995 when Talagrand 
(1995a, b) inserted certain missing factors. Assuming independence and un- 
der the conditions of Theorem 1.2, Talagrand's bound is as follows: There 
exists an absolute constant $c > 0$ such that

(1.19) P{M„>x}< + -Ah'^\P+--.p] forc<x<— , 

\o + X J \ n J c 

where $\delta^2 = np(1 - p)$. The right-hand side of (1.19) is simplified up to an
absolute factor. This is a nonessential loss because Talagrand's bound depends
on an unspecified absolute constant.

I have a feeling that more or less explicit analytical functions do not truly
follow the behavior of $\mathbb{P}\{M_n \ge x\}$ correctly.

The loss in Theorem 1.2 is at most the factor $e/2 < 1.36$. Up to an absolute
constant, Theorem 1.2 (and Theorems 1.1 and 1.3 as well) says that the
tail probability is maximized in the case of the simplest possible stochastic
model, namely, in the case of a series of possibly asymmetric Bernoulli
trials. To estimate $\mathbb{P}\{S_n \ge x\}$ one can use special bounds for the binomial
probabilities [see, e.g., Shorack and Wellner (1986)], and in view of Theorems
1.1-1.3, these special bounds are not so special at all.

Let us move on to Hoeffding's Theorem 3. To simplify notation (and without
loss of generality) we assume that the number $b$ in Theorem 1.1 satisfies
$b = 1$. Assuming independence and under the conditions and notation of
Theorem 1.1, Hoeffding proved that

(1.20) $\mathbb{P}\{M_n \ge x\} \le H^n\left(\frac{\sigma^2 + x/n}{1 + \sigma^2};\, \frac{\sigma^2}{1 + \sigma^2}\right)$.

The simplest Hoeffding bound (1.17) is implied by (1.20) by using rescaling
and choosing the maximal possible variance $\sigma^2 = p - p^2$ for distributions
supported by the interval $[-p, 1 - p]$. Assuming, in addition, that $|X_k| \le B$,
Talagrand (1995b) improved (1.20): There exists an absolute constant $c > 0$
such that

l + (j2 'iT^J 

for $0 \le x \le n\sigma^2/(cB)$. Talagrand noticed that it is unclear how to improve
(1.20) without assumptions like $|X_k| \le B$. The inequality (1.21) nicely
improves (1.20) when the variance is not too small, that is, in cases of
Gaussian type behavior. To see this better, assume for simplicity that $B = 1$.
Then, in the case of $\sigma^2 = 1$, the factor in (1.21) is on the order of $\sqrt{n}/x$,
in the range $\sqrt{n} \ll x \ll n$. However, for degenerating $\sigma^2 \to 0$ (i.e., when the
behavior is of Poisson type), the range starts to shrink. To be definite, take
$\sigma^2 = 1/n$. Then the factor is $\sim 1$ and $x$ has to satisfy $x \ll 1$. Notice that in
such cases Theorem 1.1 still provides nice upper bounds in the whole range
$x \le n$ of interest.

Theorem 1.3 extends and refines Hoeffding's Theorem 2. The bound (1.16) 
with a somewhat worse constant is contained in Bentkus (2003). Using an- 
other approach, in Bentkus (2004) a bound similar to (1.16) was proved 
under the asymmetric boundedness condition 

(1.22) $\mathbb{P}\{d_k - a_k \le X_k \le d_k + a_k\} = 1$,

where $d_k = d_k(X_1,\ldots,X_{k-1})$ are arbitrary $\mathcal{F}_{k-1}$-measurable random
variables. This bound applies to the measure concentration. It is unclear whether
one can extend and refine Hoeffding's Theorem 2 under the condition (1.22)
using the methods of this article. Pinelis (1998) proved (1.16) under the
symmetric boundedness condition (1.13). Earlier [see Pinelis (1999),
Theorem 5], the bound (1.15) under the symmetric boundedness condition of
Corollary 1.4 was established by Pinelis, assuming independence, for integer
$x$ such that $x \in n + 2\mathbb{Z}$ and $|x| \le n$.

Hoeffding's Theorem 2 had a considerable impact on research related to 
the measure concentration phenomena. For an introduction to the topic, 
see Gromov and Milman (1983), Alon and Milman (1984), Milman (1985, 
1988), Milman and Schechtman (1986), McDiarmid (1989), Talagrand (1995a) 
and Ledoux (1999). 



(1.21) nMn>x}<(^^ + ^)H"( 
For statistical applications, optimal bounds for finite (i.e., fixed) n are of
interest [see Bentkus and van Zuijlen (2003)]. In this sense, the results herein 
are not optimal and hopefully can be improved by extending the methods 
of Bentkus (2001, 2004, b). However, the extensions involve considerable 
technical difficulties. 

The history of inequalities for tail probabilities is a very rich classical 
topic [see, e.g., books Petrov (1975) and Shorack and Wellner (1986)]. The 
names Chernoff, Bennett, Prokhorov and Hoeffding come to mind. For x > 
constant, the bounds above refine all the classical bounds. Indeed, one can 
estimate the binomial probability $\mathbb{P}\{S_n \ge x\}$ using these bounds.

On methods. Hoeffding (1963) applied the Chebyshev inequality to re- 
place an indicator function of an interval with an exponential function, which 
can be interpreted as a kind of Fourier-Laplace transform. The remainder of
Hoeffding's proof is precise; hence such a method cannot be used to improve the
bounds. Talagrand (1995b) started with the Esscher transform, which is
related to exponential functions. The proof in the articles by Pinelis [following
Eaton (1970, 1974)] starts similarly to that of Hoeffding, but he used the
functions $x \mapsto \max\{0, (x - t)^p\}$ instead of exponentials, with some $t \in \mathbb{R}$ and
$p \in \mathbb{Z}$. In this article, we start in the same way. A nice and short argument
in the proof of Theorem 1.2, which allows derivation of the inequality (3.9) 
from the bound (3.5), is extracted from an article by Pinelis (1999); see 
Lemma 4.2 below. The argument reduces the proof of Theorem 1.2 to the 
verification of inequalities (3.2) and (3.3). The scheme of the proof of The- 
orems 1.1 and 1.3 is similar, replacing (3.2) and (3.3) by appropriate coun- 
terparts. It seems that methods used here do not allow improvement of the 
constants and the structure of the bounds. In the aforementioned articles the 
methods rely on induction on n. Potentially such induction based methods 
can provide optimal bounds and it seems as well that they are more robust 
against generalizations. 

2. Some supplements, improvements and extensions. In this section I 
provide some well-known upper bounds for Poisson and normal survival 
functions, a bit more complicated versions of bounds of Theorems 1.1-1.3 
and precise bounds in the case n = 1. 

A standard rather rough upper bound for a Poisson survival function is 

$\mathbb{P}\{\eta \ge \lambda + x\} \le \exp\{x - (x + \lambda)\log(1 + x/\lambda)\}$ for $x \ge 0$.

For larger $x$, the following inequalities give an impression of Poisson
tails [see Proposition 3 in Paulauskas (2002)]: There exist absolute
positive constants $c_1$ and $c_2$ such that

$c_1 g(x) \le \mathbb{P}\{\eta \ge \lambda + x\} \le c_2 g(x)$ for $x \ge \max\{\lambda - 1, 1\}$,
where

$g(x) = (\lambda + x)^{-1/2}(1 + x/\lambda)^{\{\lambda + x\} - 1}\exp\{x - (x + \lambda)\log(1 + x/\lambda)\}$

and where $\{\lambda + x\}$ is the fractional part of $\lambda + x$.

A commonly used upper bound for the standard normal tail is

$1 - \Phi(x) \le \varphi(x)/x$, $\varphi(x) = (2\pi)^{-1/2}\exp\{-x^2/2\}$, $x > 0$.
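
The rough Poisson bound and the normal tail bound above can be compared with the exact tails numerically. The sketch below uses only the standard library; the function names and the truncation length `terms` are choices of this example, not part of the article.

```python
from math import ceil, erfc, exp, lgamma, log, pi, sqrt

def poisson_tail(lam, t, terms=400):
    """P{eta >= t} for Poisson eta with mean lam, summing the pmf upward from ceil(t)."""
    k0 = max(ceil(t), 0)
    return sum(exp(-lam + k * log(lam) - lgamma(k + 1)) for k in range(k0, k0 + terms))

def poisson_rough_bound(lam, x):
    """exp{x - (x + lam) log(1 + x/lam)}, the rough upper bound for P{eta >= lam + x}."""
    return exp(x - (x + lam) * log(1.0 + x / lam))

def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * erfc(x / sqrt(2.0))

def normal_rough_bound(x):
    """phi(x)/x with phi the standard normal density, valid for x > 0."""
    return exp(-x * x / 2.0) / (sqrt(2.0 * pi) * x)
```

The upward summation in `poisson_tail` avoids the cancellation that `1 - cdf` would suffer for small tail probabilities.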

Let us pass to the extensions of Theorems 1.1-1.3. The extensions are 
more convenient in applications because they do not require checking of log- 
concavity. Hence, one can manipulate the bounds, for example, by applying 
limit theorems, and check log-concavity at the final stage of the application. 
I provide as well a direct generalization and extension of Hoeffding's Theo- 
rem 3 to martingales. This extension can be useful in cases where checking 
log-concavity is not available or in cases when very precise bounds are not 
needed. It is interesting to notice that in contrast to the much more subtle 
and powerful Theorem 3, there exist lots of extensions, improvements and 
generalizations of Hoeffding's Theorem 2. Probably the reason is that The- 
orem 2 is simpler than Theorem 3, because instead of variances, it involves 
only rather rough size parameters. 

For differences $X_k$ of a martingale $M_n$, consider the following boundedness
condition: There exist nonrandom $b_k > 0$ and $\sigma_k \ge 0$ such that

(2.1) $\mathbb{P}\{X_k \le b_k\} = 1$, $\mathbb{P}\{s_k^2 \le \sigma_k^2\} = 1$

for $k = 1,\ldots,n$, where $s_k^2 = \mathbb{E}(X_k^2 \mid \mathcal{F}_{k-1})$ are the conditional variances of $X_k$.

Introduce independent Bernoulli random variables $\theta_k = \theta_k(\sigma_k^2, b_k)$ such
that

(2.2) $\mathbb{E}\theta_k = 0$, $\mathbb{E}\theta_k^2 = \sigma_k^2$, $\mathbb{P}\{\theta_k = b_k\} > 0$

[cf. the definition (1.3) of the Bernoulli random variable $\varepsilon = \varepsilon(\sigma^2, b)$]. Write

(2.3) $T_n = \theta_1 + \cdots + \theta_n$.

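Theorem 2.1. Assume that the differences $X_k$ of a martingale $M_n$ satisfy
(2.1). Then, for $h > 0$, we have

(2.4) $\mathbb{P}\{M_n \ge x\} \le \exp\{-hx\}\,\mathbb{E}\exp\{hT_n\}$

for all $x \in \mathbb{R}$. If all $b_k$ are equal, $b_k = b$, then

(2.5) $\mathbb{P}\{M_n \ge x\} \le \inf_{h>0} \exp\{-hx\}\,\mathbb{E}\exp\{hS_n\} = H^n\left(\frac{\sigma^2 + bx/n}{b^2 + \sigma^2};\, \frac{\sigma^2}{b^2 + \sigma^2}\right)$.

Here $S_n$ is a sum of $n$ independent copies of a Bernoulli random variable
$\varepsilon = \varepsilon(\sigma^2, b)$ with the variance $\sigma^2 = (\sigma_1^2 + \cdots + \sigma_n^2)/n$ and the function $H$ is
given by (1.18).
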
We provide proofs in Section 3. A number of upper bounds for the function
$H$ are provided in Hoeffding (1963).
Let $x_+ = \max\{0, x\}$ and $x_+^s = (x_+)^s$.

Theorem 2.2. Write $f(x) = (x - t)_+^s$, where $s \ge 2$, and assume that
the differences $X_k$ of a martingale $M_n$ satisfy (2.1). Then, for all $t < x$ and
$x \in \mathbb{R}$, we have

(2.6) $\mathbb{P}\{M_n \ge x\} \le \mathbb{E}f(T_n)/(x - t)^s$

and

(2.7) $\mathbb{P}\{M_n \ge x\} \le e^s s^{-s}\,\Gamma(s + 1)\,\mathbb{P}^\circ\{T_n \ge x\}$,

where $x \mapsto \mathbb{P}^\circ\{T_n \ge x\}$ is a log-concave hull of the survival function
$x \mapsto \mathbb{P}\{T_n \ge x\}$ and $\Gamma$ is the gamma function.

In the next proposition we provide precise bounds for n = 1 under the 
conditions of Theorems 1.1-1.3. 

Proposition 2.3 (Case n = 1). Assume that a random variable $X$ satisfies
$\mathbb{E}X = 0$. Let $a < 0 < b$ and $\sigma > 0$.

(i) Let $B(x) = \sup \mathbb{P}\{X \ge x\}$, where sup is taken over all random
variables $X$ such that $\mathbb{P}\{a \le X \le b\} = 1$. Then $B(x) = p$ with $p = -a/(x - a)$
for $0 < x \le b$.

(ii) Write $B(x) = \sup \mathbb{P}\{X \ge x\}$, where sup is taken over all $X$ such that

$\mathbb{P}\{X \le b\} = 1$ and $\mathbb{E}X^2 \le \sigma^2$.

Then $B(x) = p$ with $p = \sigma^2/(x^2 + \sigma^2)$ for $0 < x \le b$.

In both cases (i) and (ii) we have $B(x) = 1$ for $x \le 0$ and $B(x) = 0$ for
$x > b$.
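
The extremal values in Proposition 2.3 can be illustrated with explicit two-point laws. The sketch below assumes that the supremum is attained by a mean-zero law placing mass $p$ at the point $x$ itself; this choice of atoms is an assumption of the example, checked numerically here, and the helper names are hypothetical.

```python
def extremal_bounded(a, b, x):
    """Case (i): mean-zero law on {a, x} with P{X = x} = p = -a/(x - a), for a < 0 < x <= b."""
    p = -a / (x - a)
    return [(a, 1.0 - p), (x, p)], p

def extremal_variance(sigma2, b, x):
    """Case (ii): mean-zero law on {-sigma^2/x, x} with P{X = x} = p = sigma^2/(x^2 + sigma^2)."""
    p = sigma2 / (x * x + sigma2)
    return [(-sigma2 / x, 1.0 - p), (x, p)], p

def moment(dist, r):
    """r-th moment of a finite discrete law given as (value, probability) pairs."""
    return sum(w * v**r for v, w in dist)

def tail(dist, x):
    """P{X >= x} for a finite discrete law."""
    return sum(w for v, w in dist if v >= x)
```

In both cases the law has mean zero, respects the constraint of the corresponding case, and satisfies `tail(dist, x) == p`.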

3. Proofs. Write $x_+ = \max\{0, x\}$ and $x_+^s = (x_+)^s$. Let us start with the
proof of Theorem 1.2 because it is simpler compared to the proof of
Theorem 1.1.

Proof of Theorem 1.2. Let us prove first the bound (1.10). In the
proof we assume that $-pn < x \le n - pn$ because for other values of $x$ the
inequality (1.10) reduces either to $1 \le e$ or to $0 \le 0$, which is obvious.

Write $f(z) = (z - t)_+$, where $t \in \mathbb{R}$ is a parameter to be chosen later.
Notice that $I\{u \ge x\} \le f(u)/(x - t)$ for $t < x$, where $I\{A\}$ is the indicator
function of event $A$. Using the Chebyshev inequality, we have

(3.1) $\mathbb{P}\{M_n \ge x\} \le \mathbb{E}f(M_n)/(x - t)$ for $t < x$.

Applying Lemma 4.3, we have

(3.2) $\mathbb{E}f(M_n) \le \mathbb{E}f(T_n)$,

where $T_n = \theta_1 + \cdots + \theta_n$ is a sum of independent Bernoulli random variables
such that

$\mathbb{P}\{\theta_k = -p_k\} = 1 - p_k$ and $\mathbb{P}\{\theta_k = 1 - p_k\} = p_k$.

We are going to replace the possibly non-i.i.d. Bernoulli random variables
$\theta_k$ with the i.i.d. Bernoulli random variables from the condition of the
theorem. If $f$ is a convex function, then

(3.3) $\mathbb{E}f(T_n) \le \mathbb{E}f(\varepsilon_1 + \cdots + \varepsilon_n) = \mathbb{E}f(S_n)$,

with $S_n$ from the condition of the theorem. Hoeffding [(1956), Theorem 3]
proved (3.3) for strictly convex $f$ and Gleser [(1975), Corollary 2.1]
extended (3.3) to convex $f$. One can easily check (3.3) using the Schur
concavity; see the proof of Lemma 4.5 for a definition of Schur concave functions.
In the specific case of $f(x) = (x - t)_+$ we have

(3.4) $\mathbb{E}f(S_n) = -\int_t^\infty (z - t)\,d\mathbb{P}\{S_n \ge z\} = \int_t^\infty \mathbb{P}\{S_n \ge z\}\,dz$.

Combining (3.1)-(3.4), we obtain

(3.5) $\mathbb{P}\{M_n \ge x\} \le \inf_{t<x} \frac{1}{x - t}\int_t^\infty \mathbb{P}\{S_n \ge z\}\,dz$.

To estimate the right-hand side of (3.5), we can apply Lemma 4.2 with

$\alpha = -pn$, $\beta = n - pn$, $s = 1$ and $B(z) = \mathbb{P}\{S_n \ge z\}$.

We get $\mathbb{P}\{M_n \ge x\} \le e\,\mathbb{P}^\circ\{S_n \ge x\}$, which concludes the proof of (1.10).

Let us prove (1.11). The sum $S_n$ is a sum of $n$ independent copies
of a Bernoulli random variable $\varepsilon = \varepsilon(p - p^2, 1 - p)$. Applying the bound (1.7)
of Theorem 1.1, we obtain

(3.6) $\mathbb{P}\{S_n \ge z\} \le \frac{e^2}{2}\,\mathbb{P}^\circ\left\{\eta \ge \lambda + \frac{z}{1-p}\right\}$,

where $\eta$ is a Poisson random variable with $\lambda = pn/(1 - p)$. Combining
inequalities (3.5) and (3.6), we have

(3.7) $\mathbb{P}\{M_n \ge x\} \le \frac{e^2}{2}\inf_{t<x} \frac{1}{x - t}\int_t^\infty \mathbb{P}^\circ\left\{\eta \ge \lambda + \frac{z}{1-p}\right\}dz$,

and an application of Lemma 4.2 yields (1.11). □

Proof of Theorem 2.1. Let us prove the bound (2.4). Using the
Chebyshev inequality, we have

$\mathbb{P}\{M_n \ge x\} \le \exp\{-hx\}\,\mathbb{E}\exp\{hM_n\}$ for $h > 0$.

By Lemma 4.4, we have $\mathbb{E}\exp\{hM_n\} \le \mathbb{E}\exp\{hT_n\}$, which concludes the
proof of (2.4).

Let us prove (2.5). The inequality

$\exp\{-hx\}\,\mathbb{E}\exp\{hT_n\} \le \exp\{-hx\}\,\mathbb{E}\exp\{hS_n\}$

is proved in Hoeffding [(1963), (4.22) in the proof of Theorem 3]. The equality
in (2.5) is just the definition of the Hoeffding function; see Hoeffding (1963).
□

Proof of Theorem 2.2. Let us prove (2.6). Write $f(z) = (z - t)_+^s$.
Using the Chebyshev inequality, we have $\mathbb{P}\{M_n \ge x\} \le \mathbb{E}f(M_n)/(x - t)^s$ for
$t < x$. By Lemma 4.4, we can estimate $\mathbb{E}f(M_n) \le \mathbb{E}f(T_n)$, and (2.6) follows.
The bound (2.7) is implied by Lemma 4.2. □

Proof of Theorem 1.1. Let us prove (1.6). Without loss of generality
(rescaling if necessary), we can assume that the number $b$ from the
condition (1.4) satisfies $b = 1$. In the proof we assume that $-n\sigma^2 < x \le n$, because
for $x \le -n\sigma^2$ or $x > n$ the inequality (1.6) reduces to the obvious $1 \le e^2/2$ or
$0 \le 0$, respectively.

Write $f(z) = (z - t)_+^2$, where $t \in \mathbb{R}$ is a parameter to be chosen later. Using
the Chebyshev inequality, we have

(3.8) $\mathbb{P}\{M_n \ge x\} \le \mathbb{E}f(M_n)/(x - t)^2$ for $t < x$.

By Lemma 4.4, we can replace $M_n$ with a sum of Bernoulli random
variables, that is,

(3.9) $\mathbb{E}f(M_n) \le \mathbb{E}f(T_n)$,

where $T_n = \theta_1 + \cdots + \theta_n$ is a sum of independent, possibly nonidentically
distributed Bernoulli random variables $\theta_k = \theta(\sigma_k^2, 1)$ [cf. (2.2)].

By Lemma 4.5, we can replace the non-i.i.d. with the i.i.d. Bernoulli
random variables,

(3.10) $\mathbb{E}f(T_n) \le \mathbb{E}f(\varepsilon_1 + \cdots + \varepsilon_n) = \mathbb{E}f(S_n)$,

where $S_n = \varepsilon_1 + \cdots + \varepsilon_n$ is a sum of i.i.d. Bernoulli random variables
$\varepsilon_k = \varepsilon(\sigma^2, 1)$.

In the specific case of $f(x) = (x - t)_+^2$, we have

(3.11) $\mathbb{E}f(S_n) = -\int_t^\infty (z - t)^2\,d\mathbb{P}\{S_n \ge z\} = 2\int_t^\infty (z - t)\,\mathbb{P}\{S_n \ge z\}\,dz$.

Combining (3.8)-(3.11), we obtain

(3.12) $\mathbb{P}\{M_n \ge x\} \le \inf_{t<x} \frac{2}{(x - t)^2}\int_t^\infty (z - t)\,\mathbb{P}\{S_n \ge z\}\,dz$.

To estimate the right-hand side of (3.12), we can apply Lemma 4.2 with

$\alpha = -n\sigma^2$, $\beta = n$, $s = 2$ and $B(z) = \mathbb{P}\{S_n \ge z\}$.

We get $\mathbb{P}\{M_n \ge x\} \le (e^2/2)\,\mathbb{P}^\circ\{S_n \ge x\}$, proving (1.6).

It remains to prove (1.7). Introduce the martingale $K_{n+m} = Y_1 + \cdots +
Y_{n+m}$ with the differences

(3.13) $Y_k = X_k$ for $k = 1,\ldots,n$ and $Y_k = 0$ for $k = n+1,\ldots,n+m$.

To the martingale $K_{n+m}$ we can apply the bound (1.6) of Theorem 1.1. We
get

(3.14) $\mathbb{P}\{M_n \ge x\} = \mathbb{P}\{K_{n+m} \ge x\} \le \frac{e^2}{2}\,\mathbb{P}^\circ\{S_{n+m} \ge x\}$,

where $S_{n+m}$ is a sum of $n + m$ independent copies of a Bernoulli random
variable

$\varepsilon = \varepsilon(\sigma^2, b)$ with $\sigma^2 = (\sigma_1^2 + \cdots + \sigma_n^2)/(n + m) = b^2\lambda/(n + m)$.

Centering and rescaling, we get

(3.15) $\mathbb{P}\{S_{n+m} \ge x\} = \mathbb{P}\{Z_{n+m} \ge (\lambda + x/b)/(1 + \lambda/(n + m))\}$,

where $Z_{n+m}$ is a sum of $n + m$ independent copies of a Bernoulli random
variable, say $\xi$, such that

$\mathbb{P}\{\xi = 0\} = q$ with $q = 1 - p$ and $p = \mathbb{P}\{\xi = 1\} = \lambda/(n + m + \lambda)$.

To the sum $Z_{n+m}$ we can apply the Poisson limit theorem because $p(n + m) \to \lambda$
as $m \to \infty$. We get

(3.16) $\lim_{m\to\infty} \mathbb{P}\{Z_{n+m} \ge (\lambda + x/b)/(1 + \lambda/(n + m))\} = \mathbb{P}\{\eta \ge \lambda + x/b\}$.

Combining (3.14)-(3.16), we conclude the proof of (1.7). □

Proof of Theorem 1.3. Let us prove (1.15). Write $f(z) = (z - t)_+^3$.
Similar to (3.1) and (3.8), we have

(3.17) $\mathbb{P}\{M_n \ge x\} \le \mathbb{E}f(M_n)/(x - t)^3$ for $t < x$.

Let $T_n = \theta_1 + \cdots + \theta_n$ be a sum of independent Bernoulli random variables
such that $\mathbb{P}\{\theta_k = -a_k\} = \mathbb{P}\{\theta_k = a_k\} = 1/2$. An application of Lemma 4.6
yields $\mathbb{E}f(M_n) \le \mathbb{E}f(T_n)$. The inequality

(3.18) $\mathbb{E}f(T_n) \le \mathbb{E}f(\varepsilon_1 + \cdots + \varepsilon_n) = \mathbb{E}f(S_n)$,

where $S_n = \varepsilon_1 + \cdots + \varepsilon_n$ is a sum of $n$ independent copies of a symmetric
Bernoulli random variable $\varepsilon = \varepsilon(a^2, a)$, was established in Eaton (1970, 1974)
and Pinelis (1994).

Using $f(x) = (x - t)_+^3$, integrating by parts and combining (3.17) and
(3.18), we have

$\mathbb{P}\{M_n \ge x\} \le \inf_{t<x} \frac{3}{(x - t)^3}\int_t^\infty (z - t)^2\,\mathbb{P}\{S_n \ge z\}\,dz$.

Now an application of Lemma 4.2 implies (1.15).

It remains to prove (1.16). Introduce the martingale $K_{n+m} = Y_1 + \cdots +
Y_{n+m}$ with the differences defined by (3.13). To the martingale $K_{n+m}$ we
can apply the bound (1.15) of Theorem 1.3. We get

(3.19) $\mathbb{P}\{M_n \ge x\} = \mathbb{P}\{K_{n+m} \ge x\} \le \frac{2e^3}{9}\,\mathbb{P}^\circ\left\{(n + m)^{-1/2}S_{n+m} \ge \frac{x}{a\sqrt{n}}\right\}$,

where $S_{n+m}$ is a sum of $n + m$ independent copies of a symmetric Bernoulli
random variable, say $\varepsilon$, such that $\mathbb{P}\{\varepsilon = -1\} = \mathbb{P}\{\varepsilon = 1\} = 1/2$. We conclude
the proof of (1.16) by passing to the limit in (3.19) as $m \to \infty$ and using the
central limit theorem. □

Proof of Corollary 1.4. The boundedness condition (1.13) guarantees
that the conditional variances $s_k^2$ are bounded from above by $b_k^2$. Hence,
$a_k = b_k$ and we can apply (1.15) and (1.16) with $na^2 = b_1^2 + \cdots + b_n^2$. □

Proof of Proposition 2.3. It suffices to prove (i) and (ii) only for
$0 < x \le b$. Indeed, for $x > b$ we have obviously $B(x) = 0$. For $x \le 0$, the upper
bound $B(x) \le 1$ is obvious; the lower bound $B(x) \ge 1$ follows by considering
the random variable $X = 0$.

(i) For $0 < x \le b$, the linear function $u(t) = (1 - p)t/x + p$ satisfies
$I\{t \ge x\} \le u(t)$ for all $t$ from the interval $[a, b]$. Therefore, we have

(3.20) $\mathbb{P}\{X \ge x\} \le \mathbb{E}u(X) = p$ and $B(x) \le p$.

The lower bound $B(x) \ge p$ is realized by a Bernoulli random variable, say
$X = \varepsilon$, such that $\mathbb{P}\{\varepsilon = a\} = 1 - p$ and $\mathbb{P}\{\varepsilon = x\} = p$.

(ii) For $0 < x \le b$ and $t \le b$, the quadratic function $u(t) = (1 - p)^2 x^{-2}(t +
\sigma^2/x)^2$ satisfies the inequality $I\{t \ge x\} \le u(t)$. Similar to (3.20) it follows
that $B(x) \le p$. The lower bound $B(x) \ge p$ is realized by a Bernoulli random
variable, say $X = \varepsilon$, such that $\mathbb{P}\{\varepsilon = -\sigma^2/x\} = 1 - p$ and $\mathbb{P}\{\varepsilon = x\} = p$. □

4. Auxiliary results. A function $f : A \to [0, \infty)$ defined on a subset $A \subset \mathbb{R}$
is called log-concave if the function $x \mapsto -\log f(x)$ is convex. Whereas $f$ can
assume the value 0, the function $-\log f$ can assume the value $\infty$. Call a
random variable $X$ discrete if there exists a countable set $A$ such that

$\mathbb{P}\{X \in A\} = 1$ and $\mathbb{P}\{X = x\} > 0$ for all $x \in A$.

A survival function $x \mapsto \mathbb{P}\{X \ge x\}$ is called discrete if $X$ is discrete. A
binomial survival function $x \mapsto \mathbb{P}\{S_n \ge x\}$ is discrete and is not log-concave as a
function defined on $\mathbb{R}$. However, it is log-concave as a function defined on the
set $A$ of points at which it has positive jumps down (see Lemma 4.1).
Therefore, a discrete survival function is called log-concave if it is log-concave as
a function defined on the set $A$ (hopefully this terminology will not lead to
misunderstanding). For a function $f : \mathbb{R} \to [0, \infty)$, introduce its log-concave
hull $f^\circ : \mathbb{R} \to [0, \infty)$ as a minimal log-concave function such that $f \le f^\circ$. Any
survival function has a unique log-concave hull which is again a log-concave
survival function.

For a random variable $X$, which assumes integer values, the probability
mass function is defined as $p_n = \mathbb{P}\{X = n\}$ for $n \in \mathbb{Z}$. In the literature,
distributions with log-concave densities and probability mass functions are
referred to as strongly unimodal in the sense of Ibragimov [cf. Keilson and Gerber
(1971) and Ibragimov (1956)]. We are interested in log-concave survival
functions, which have a weaker requirement compared to the strong unimodality.
The next lemma is just a reexposition of some facts from Keilson and Gerber
(1971) and Pinelis (1998, 1999).

Lemma 4.1. (i) Let $n \mapsto p_n$ and $n \mapsto q_n$ be log-concave functions such
that $p_n, q_n \ge 0$. Then the convolution

$(p * q)_n = \sum_{k=-\infty}^{\infty} p_{n-k}\,q_k$

is a log-concave function.

(ii) Let $n \mapsto p_n$ be a log-concave function such that $p_n \ge 0$. Then the
function $n \mapsto t_n$ with $t_n = \sum_{k \ge n} p_k$ is a log-concave function.

(iii) Bernoulli random variables have log-concave probability mass
functions. Binomial survival functions are log-concave (as discrete ones).

(iv) Let $B_k$ be a sequence of log-concave survival functions which have
probability mass functions supported by $\mathbb{Z}$. Then the pointwise limit
$\lim_{k\to\infty} B_k$ is a log-concave function.

(v) Poisson survival functions are log-concave (as discrete ones).

(vi) Binomial and Poisson survival functions $B$ satisfy $B \le B^\circ \le B^{l}$. In
both cases $B^\circ$ is just a log-linear interpolation of $B$.
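
Parts (i)-(iii) of the lemma lend themselves to a direct numerical check on finitely supported sequences. The sketch below (illustrative helper names; the tolerance only absorbs floating-point rounding) convolves several 0/1 probability mass functions and tests the discrete log-concavity inequality $p_k^2 \ge p_{k-1}p_{k+1}$ for the result and for its tail sums.

```python
def convolve(p, q):
    """(p * q)_n = sum_k p_{n-k} q_k for finitely supported sequences given as lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def tail_sums(p):
    """t_n = sum_{k >= n} p_k, the discrete survival function of the pmf p."""
    out, running = [], 0.0
    for v in reversed(p):
        running += v
        out.append(running)
    return out[::-1]

def is_log_concave(seq, tol=1e-12):
    """Check seq[k]^2 >= seq[k-1] * seq[k+1] for a strictly positive finite sequence."""
    return all(seq[k] ** 2 + tol >= seq[k - 1] * seq[k + 1]
               for k in range(1, len(seq) - 1))
```

Starting from `pmf = [1.0]` and convolving with `[1 - p, p]` for several values of `p` yields the pmf of a sum of independent 0/1 variables, to which both checks apply.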
In general, it is not true that a sum of two independent discrete random
variables with log-concave survival functions has a log-concave discrete
survival function. Indeed, let $\varepsilon, \varepsilon_1, \varepsilon_2$ be i.i.d. Bernoulli random
variables such that

$\mathbb{P}\{\varepsilon = 0\} = q$ and $\mathbb{P}\{\varepsilon = 1\} = p$ with $p + q = 1$.

Then the discrete survival function $B(x) = \mathbb{P}\{\varepsilon_1 + a\varepsilon_2 \ge x\}$ is not
log-concave provided that the numbers $a > 0$ and $p > 0$ are sufficiently small.
Indeed, assume that the function $B$ is log-concave. The random variable
$\varepsilon_1 + a\varepsilon_2$ assumes the values $0 < a < 1 < 1 + a$. The log-concavity of $B$ yields
$B(0)^{1-a} B(1)^{a} \le B(a)$, which is equivalent to $p^{a} \le 2p - p^2$. Passing to the
limit as $a \downarrow 0$, we have $1 \le 2p - p^2$, which is impossible if $p > 0$ is sufficiently
small. A similar consideration shows that survival functions of discrete
infinitely divisible random variables are not necessarily log-concave: for example,
the survival function of $\eta + a\xi$, where $\eta$ and $\xi$ are Poisson random variables
with parameters $\lambda > 0$ and $\gamma > 0$, is not log-concave provided that $a > 0$ and
$\gamma > 0$ are sufficiently small (just consider the values of the survival function
at the points $0$, $a$ and $2a$).
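The two-point counterexample above can be checked numerically. The following is a minimal sketch, assuming $0 < a < 1$ so that the four atoms $0 < a < 1 < 1 + a$ are distinct; the function names `survival` and `log_concavity_gap` are illustrative and do not come from the article.

```python
def survival(x, a, p):
    """B(x) = P{eps1 + a*eps2 >= x} for i.i.d. eps with P{eps=1}=p, P{eps=0}=1-p.

    Assumes 0 < a < 1, so the atoms 0, a, 1, 1+a are distinct.
    """
    q = 1.0 - p
    atoms = {0.0: q * q, a: q * p, 1.0: p * q, 1.0 + a: p * p}
    return sum(prob for value, prob in atoms.items() if value >= x)


def log_concavity_gap(a, p):
    """Return B(a) - B(0)^(1-a) * B(1)^a.

    Since a = (1-a)*0 + a*1, log-concavity of B would force this gap to be
    nonnegative; a negative value exhibits the failure of log-concavity.
    """
    b0 = survival(0.0, a, p)
    ba = survival(a, a, p)
    b1 = survival(1.0, a, p)
    return ba - b0 ** (1.0 - a) * b1 ** a


if __name__ == "__main__":
    # Small a and small p: the gap is negative, so B is not log-concave.
    print(log_concavity_gap(0.05, 0.1))
```

For small $a$ and $p$ the gap equals $2p - p^2 - p^a$, which is negative, in line with the limiting argument in the text.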

Proof of Lemma 4.1. (i) Write

$\delta = (p * q)_n^2 - (p * q)_{n-1}\,(p * q)_{n+1}$.

We have to prove that $\delta \ge 0$. It is easy to check that $2\delta = \sum_{k,r=-\infty}^{\infty} \alpha\beta$ with

$\alpha = p_k p_r - p_{k+1} p_{r-1}$ and $\beta = q_{n-k}\, q_{n-r} - q_{n-k-1}\, q_{n-r+1}$.

If $k \ge r$, then $\alpha \ge 0$ and $\beta \ge 0$, because both functions $n \mapsto p_n$ and $n \mapsto q_n$
are log-concave. If $k < r$, then $\alpha \le 0$ and $\beta \le 0$, which concludes the proof
of $\delta \ge 0$.

(ii) Notice that $t_n = (p * q)_n$, where $q_n = \mathbb{I}\{n \le 0\}$ is a log-concave function,
and apply (i).

(iii) It is clear that Bernoulli probability mass functions are log-concave. 
Hence, by applying (i), binomial probability mass functions are log-concave. 
Therefore, (ii) guarantees that binomial survival functions are log-concave. 

(iv) Obvious. 

(v) A Poisson survival function is a limit of a sequence of binomial
survival functions. Therefore we can apply (iii) and (iv).

(vi) The inequality B < B° is obvious. The inequality B° < B'^ is equiv- 
alent to the elementary inequality 

a^~^b^<a + b for a > 6 > and 0<A<1. 

To see that $B^\circ$ is a log-linear interpolation of $B$, it suffices to compare the
graphs of $-\log B$ and $-\log B^\circ$. □
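Parts (i)-(iii) of Lemma 4.1 lend themselves to a quick numerical sanity check. The sketch below is only an illustration on finitely supported sequences with no interior zeros; the helper names (`convolve`, `is_log_concave`, `tail_sums`, `binomial_pmf`) and the relative tolerance are assumptions of this example, not part of the article.

```python
def convolve(p, q):
    """Discrete convolution (p*q)_n = sum_k p_{n-k} q_k of two finite sequences."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def is_log_concave(seq, rel_tol=1e-9):
    """Check seq[n]^2 >= seq[n-1]*seq[n+1] at interior points.

    Assumes seq is nonnegative with no interior zeros; rel_tol absorbs
    floating-point rounding.
    """
    return all(
        seq[n] ** 2 >= seq[n - 1] * seq[n + 1] * (1.0 - rel_tol)
        for n in range(1, len(seq) - 1)
    )


def tail_sums(p):
    """t_n = sum_{k >= n} p_k, the discrete survival function of the pmf p."""
    out, acc = [], 0.0
    for pk in reversed(p):
        acc += pk
        out.append(acc)
    return out[::-1]


def binomial_pmf(n, p):
    """Binomial pmf built as an n-fold convolution of the Bernoulli pmf (1-p, p)."""
    pmf = [1.0]
    for _ in range(n):
        pmf = convolve(pmf, [1.0 - p, p])
    return pmf


if __name__ == "__main__":
    pmf = binomial_pmf(12, 0.3)
    print(is_log_concave(pmf), is_log_concave(tail_sums(pmf)))
```

Building the binomial law by repeated convolution mirrors the order of the proof: (i) gives the log-concavity of the probability mass function, and (ii) then passes to the survival function.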
In the case of a log-concave B, the next lemma was proved by Pinelis
(1999). Actually, the work by Pinelis contains more general results. In the 
special case s = 1, the result was established by Bretagnolle (1980) and by 
Kemperman [see Shorack and Wellner (1986), Chapter 25, Lemma 1]. 

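Lemma 4.2. Let $s > 0$. Let $B$ be a survival function with a log-concave
hull $B^\circ$. Let

$\alpha = \sup\{y : B^\circ(y) = 1\}$ and $\beta = \inf\{y : B^\circ(y) = 0\}$.

Then, for $x$ such that $\alpha < x \le \beta$, we have

(4.1) $\inf_{t < x} (x - t)^{-s} \int_t^\infty s (z - t)^{s-1} B(z)\,dz \le e^{s} s^{-s}\, \Gamma(s + 1)\, B^\circ(x)$,

where $\Gamma(s) = \int_0^\infty r^{s-1} \exp\{-r\}\,dr$.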

Proof. Because it is short, the proof is provided. The function $z \mapsto -\log B^\circ(z)$
is a convex function. It is clear that this function is strictly
positive and strictly increasing in the interval $(\alpha, \beta]$. Hence, for each
$x \in (\alpha, \beta]$, there exists a linear function, say $y(z) = a + bz$, with some
$b > 0$, such that

$y(x) = -\log B^\circ(x)$ and $-\log B^\circ(z) \ge y(z)$ for all $z \in \mathbb{R}$.

The numbers $a = a(x, B)$ and $b = b(x, B)$ can depend on $x$ and $B$. In
particular, we have

(4.2) $B^\circ(x) = \exp\{-a - bx\}$, $B^\circ(z) \le \exp\{-a - bz\}$ for all $z \in \mathbb{R}$.

Using $B \le B^\circ$ and (4.2), we have

$\int_t^\infty s(z - t)^{s-1} B(z)\,dz \le \exp\{-a\} \int_t^\infty s(z - t)^{s-1} \exp\{-bz\}\,dz$

(4.3) $= \Gamma(s + 1)\, b^{-s} \exp\{-a - bt\}$

$= \Gamma(s + 1)\, b^{-s} \exp\{b(x - t)\}\, B^\circ(x)$.

Using (4.3) and choosing $t$ such that $b(x - t) = s$, we obtain (4.1). □
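As an illustration of how the choice $b(x - t) = s$ works, here is a short worked example for one particular survival function. The choice of the standard exponential law is an assumption made only for this illustration; it is not an example taken from the article.

```latex
% Worked example (illustrative assumption): B is the survival function of the
% standard exponential law, B(z) = e^{-z} for z >= 0 and B(z) = 1 for z < 0.
% B is log-concave, so B^\circ = B, \alpha = 0, \beta = \infty, and the
% supporting line at any x > 0 has a = 0, b = 1.
\begin{align*}
\int_t^\infty s (z-t)^{s-1} e^{-z}\,dz
   &= \Gamma(s+1)\, e^{-t}, \qquad 0 \le t < x, \\
(x-t)^{-s}\, \Gamma(s+1)\, e^{-t}\Big|_{t = x - s}
   &= s^{-s}\, \Gamma(s+1)\, e^{s}\, e^{-x}
    = e^{s} s^{-s}\, \Gamma(s+1)\, B^\circ(x), \qquad x \ge s .
\end{align*}
```

In this case the value of the left-hand expression at $t = x - s$ coincides with the right-hand side of (4.1), so the estimate (4.3) loses nothing for this $B$ when $x \ge s$.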

It seems that the next lemma has to be a well-known fact [a useful related 
reference is Karlin and Studden (1966)]. We write $\xi = \xi(a, b)$ for a Bernoulli
random variable such that

(4.4) $\mathbb{P}\{\xi = a\} = b/(b - a)$ and $\mathbb{P}\{\xi = b\} = -a/(b - a)$.
Lemma 4.3. (i) Let $f : \mathbb{R} \to \mathbb{R}$ be a convex function. Assume that a
random variable $X$ satisfies

$\mathbb{E}X = 0$, $\mathbb{P}\{a \le X \le b\} = 1$, $a < 0 < b$.

Then $\mathbb{E}f(X) \le \mathbb{E}f(\xi)$, where $\xi$ is a Bernoulli random variable satisfying
(4.4).

(ii) Let a function $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function of each of the variables
$x_1, \dots, x_n$ when the remaining $n - 1$ variables are kept fixed. Assume that
the differences of a martingale $M_n = X_1 + \dots + X_n$ satisfy

$\mathbb{P}\{a_k \le X_k \le b_k\} = 1$,

where the numbers $a_k < 0 < b_k$ are nonrandom for all $k$. Let $\xi_k = \xi_k(a_k, b_k)$ be
independent Bernoulli random variables. Then we have

(4.5) $\mathbb{E}f(X_1, \dots, X_n) \le \mathbb{E}f(\xi_1, \dots, \xi_n)$.

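Proof. (i) We have to prove that $\mathbb{E}f(X) \le \mathbb{E}f(\xi)$. Let $u : [a, b] \to \mathbb{R}$ be a
linear function. Then $\mathbb{E}u(X) = \mathbb{E}u(\xi)$ because $\mathbb{E}X = \mathbb{E}\xi = 0$. Choose $u$ such
that $u(a) = f(a)$ and $u(b) = f(b)$. Then $f \le u$ on $[a, b]$ because $f$ is convex.
Furthermore, $\mathbb{E}u(\xi) = \mathbb{E}f(\xi)$ because $\mathbb{P}\{\xi \in \{a, b\}\} = 1$. Hence
$\mathbb{E}f(X) \le \mathbb{E}u(X) = \mathbb{E}u(\xi) = \mathbb{E}f(\xi)$, which concludes the proof in the case (i).

(ii) We use induction in $n$. In the case $n = 1$, the result was proved in (i).
Let $n > 1$ and let (4.5) hold for $1, \dots, n - 1$. Notice that for given $X_1$, the
sequence

(4.6) $Z_0 = 0$, $Z_1 = X_2$, $\dots$, $Z_{n-1} = X_2 + \dots + X_n$

is a martingale sequence with differences that satisfy

(4.7) $\mathbb{P}\{a_{k+1} \le Z_k - Z_{k-1} \le b_{k+1}\} = 1$ for $k = 1, \dots, n - 1$.

Conditioning on $X_1$ and applying the induction assumption twice (for $n - 1$
and $1$), we have

$\mathbb{E}f(X_1, \dots, X_n) = \mathbb{E}\,\mathbb{E}(f(X_1, \dots, X_n) \mid X_1)$

$\le \mathbb{E}\,\mathbb{E}(f(X_1, \xi_2, \dots, \xi_n) \mid X_1)$

$= \mathbb{E}\,\mathbb{E}(f(X_1, \xi_2, \dots, \xi_n) \mid \xi_2, \dots, \xi_n)$

$\le \mathbb{E}\,\mathbb{E}(f(\xi_1, \xi_2, \dots, \xi_n) \mid \xi_2, \dots, \xi_n)$

$= \mathbb{E}f(\xi_1, \dots, \xi_n)$,

which completes the proof of (4.5) for $n > 1$. □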
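Part (i) of Lemma 4.3 can be illustrated on a small discrete example. The sketch below is a numerical check under stated assumptions, not a proof: the three-point law `X_LAW` is a hypothetical mean-zero law on $[a, b] = [-1, 2]$ chosen for the illustration, and the helper names are not from the article.

```python
import math


def bernoulli_xi(a, b):
    """Two-point law (4.4): P{xi = a} = b/(b-a), P{xi = b} = -a/(b-a), a < 0 < b."""
    return [(a, b / (b - a)), (b, -a / (b - a))]


def expect(f, law):
    """Expectation of f under a finite law given as (value, probability) pairs."""
    return sum(prob * f(value) for value, prob in law)


# Hypothetical mean-zero law supported in [-1, 2] (assumption of this example).
X_LAW = [(-1.0, 0.5), (0.0, 0.25), (2.0, 0.25)]
XI_LAW = bernoulli_xi(-1.0, 2.0)

# A few convex test functions.
CONVEX_FUNCTIONS = {
    "exp": math.exp,
    "square": lambda x: x * x,
    "abs_shifted": lambda x: abs(x - 0.5),
}

if __name__ == "__main__":
    for name, f in CONVEX_FUNCTIONS.items():
        # Lemma 4.3(i) predicts E f(X) <= E f(xi) for each convex f.
        print(name, expect(f, X_LAW), expect(f, XI_LAW))
```

Both laws have mean zero, so the linear interpolant $u$ of the proof has the same expectation under each; the gap comes only from $f \le u$ on $[a, b]$.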
Lemma 4.4. Let $f$ be one of the functions

$f(x) = (x - t)_+^2$, $t \in \mathbb{R}$;
$f(x) = (x - t)_+^s$, $s > 2$;
$f(x) = \exp\{hx\}$, $h > 0$.

(i) Assume that a random variable $X$ satisfies

$\mathbb{P}\{X \le b\} = 1$, $\mathbb{E}X^2 \le \sigma^2$.

Then $\mathbb{E}f(X) \le \mathbb{E}f(\theta)$, where a Bernoulli random variable satisfies
$\theta = \theta(\sigma^2, b)$ [see the definition (2.2) of $\theta$].

(ii) Assume that a martingale $M_n$ satisfies condition (2.1), that is, that
$X_k \le b_k$ and $s_k^2 \le \sigma_k^2$ with probability 1, where $s_k^2$ are the conditional
variances of the differences $X_k$. Let $T_n = \theta_1 + \dots + \theta_n$ be a sum of independent
Bernoulli random variables $\theta_k = \theta_k(\sigma_k^2, b_k)$. Then we have
$\mathbb{E}f(M_n) \le \mathbb{E}f(T_n)$.

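Proof. It suffices to prove the lemma with $f(x) = (x - t)_+^2$, $t \in \mathbb{R}$.
Indeed, both functions $g(x) = (x - t)_+^s$ and $g(x) = \exp\{hx\}$ with $s > 2$ and
$h > 0$ allow the integral representation

(4.8) $g(x) = \frac{1}{2} \int_{\mathbb{R}} g'''(u)\,(x - u)_+^2\,du$, $g''' \ge 0$.

Therefore, the inequality $\mathbb{E}(M_n - u)_+^2 \le \mathbb{E}(T_n - u)_+^2$ for all $u \in \mathbb{R}$ clearly
implies $\mathbb{E}g(M_n) \le \mathbb{E}g(T_n)$.

Henceforth let $f(x) = (x - t)_+^2$.

(i) Let us prove that $\mathbb{E}f(X) \le \mathbb{E}f(\theta)$. The r.v. $X$ satisfies $\mathbb{P}\{X \le b\} = 1$.
We consider the following cases separately:

(a) $t \le -\sigma^2/b$;

(b) $-\sigma^2/b \le t \le b$;

(c) $t \ge b$.

Case (a). Using

$(x - t)_+^2 \le (x - t)^2$ and $\mathbb{E}X = 0$, $\mathbb{E}X^2 \le \sigma^2$

we have

$\mathbb{E}f(X) \le \mathbb{E}(X - t)^2 \le \sigma^2 + t^2 = \mathbb{E}(\theta - t)^2 = \mathbb{E}(\theta - t)_+^2 = \mathbb{E}f(\theta)$.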
Case (b). Notice that

$(x - t)_+^2 \le c\,(x + \sigma^2/b)^2$ for $x \le b$, where $c = b^2 (b - t)^2 / (b^2 + \sigma^2)^2$.

Using this inequality and $\mathbb{E}X = 0$, $\mathbb{E}X^2 \le \sigma^2$, we obtain

$\mathbb{E}f(X) \le c\,\mathbb{E}(X + \sigma^2/b)^2 \le c\,(\sigma^2 + \sigma^4/b^2) = \mathbb{E}(\theta - t)_+^2 = \mathbb{E}f(\theta)$.

Case (c). Now we have $\mathbb{E}f(X) = \mathbb{E}f(\theta) = 0$ and there is nothing to
prove.

The proof of (i) is completed.

(ii) Using induction in $n$, we shall show that (i) yields (ii). For $n = 1$, the
assertion (ii) is equivalent to (i). Assume that (ii) holds for $1, \dots, n - 1$. Let
us prove (ii) for $n$. Notice that for given $X_1$, the sequence

$Z_0 = 0$, $Z_1 = X_2$, $\dots$, $Z_{n-1} = X_2 + \dots + X_n$

is a martingale sequence with differences satisfying

$\mathbb{P}\{Z_k - Z_{k-1} \le b_{k+1}\} = 1$, $\mathbb{E}((Z_k - Z_{k-1})^2 \mid Z_1, \dots, Z_{k-1}) \le \sigma_{k+1}^2$

for $k = 1, \dots, n - 1$. Conditioning on $X_1$ and applying the induction
assumption twice (for $n - 1$ and $1$), we have

$\mathbb{E}f(M_n) = \mathbb{E}\,\mathbb{E}(f(X_1 + \dots + X_n) \mid X_1)$

$\le \mathbb{E}\,\mathbb{E}(f(X_1 + \theta_2 + \dots + \theta_n) \mid X_1)$

$= \mathbb{E}\,\mathbb{E}(f(X_1 + \theta_2 + \dots + \theta_n) \mid \theta_2, \dots, \theta_n)$

$\le \mathbb{E}\,\mathbb{E}(f(\theta_1 + \theta_2 + \dots + \theta_n) \mid \theta_2, \dots, \theta_n) = \mathbb{E}f(T_n)$,

which completes the proof of (ii) and of the lemma. □
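The comparison in part (i) of Lemma 4.4 can be probed numerically for $f(x) = (x - t)_+^2$. The sketch below rests on two assumptions that are not spelled out in this section: that $\theta(\sigma^2, b)$ is the mean-zero two-point law with values $-\sigma^2/b$ and $b$ (this matches (4.11) when $b = 1$, but definition (2.2) itself is not reproduced here), and that $X$ has mean zero, as used in the proof. The law `X_LAW` is a hypothetical example.

```python
def pos_sq(x, t):
    """f(x) = (x - t)_+^2."""
    return max(x - t, 0.0) ** 2


def theta_law(sigma2, b):
    """Assumed form of theta(sigma^2, b): mean zero, variance sigma^2, values -sigma^2/b and b."""
    denom = b * b + sigma2
    return [(-sigma2 / b, b * b / denom), (b, sigma2 / denom)]


def expect(f, law):
    """Expectation of f under a finite law given as (value, probability) pairs."""
    return sum(prob * f(value) for value, prob in law)


def gap(law_x, sigma2, b, t):
    """E f(theta) - E f(X) for f(x) = (x - t)_+^2; the lemma predicts a nonnegative gap."""
    return expect(lambda x: pos_sq(x, t), theta_law(sigma2, b)) - expect(
        lambda x: pos_sq(x, t), law_x
    )


# Hypothetical X: symmetric +-1, so E X = 0, E X^2 = 1, and X <= b = 2.
X_LAW = [(-1.0, 0.5), (1.0, 0.5)]

if __name__ == "__main__":
    for t in (-1.0, -0.75, 0.0, 1.0, 2.5):
        print(t, gap(X_LAW, 1.0, 2.0, t))
```

For $t \le -\sigma^2/b$ the gap vanishes because both sides equal $\sigma^2 + t^2$, which is case (a); for $t \ge b$ both sides are zero, which is case (c).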

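Lemma 4.5. Let

(4.9) $x_1, \dots, x_n \ge 0$, $a = (x_1 + \dots + x_n)/n$.

Let $T_n = \theta_1 + \dots + \theta_n$ be a sum of independent (possibly non-i.i.d.) Bernoulli
random variables $\theta_k = \theta_k(x_k, 1)$. Let $S_n = \varepsilon_1 + \dots + \varepsilon_n$ be a sum of $n$
independent copies of a Bernoulli random variable $\varepsilon = \varepsilon(a, 1)$. Let $f(x) = (x - t)_+^2$.
Then, for any $t \in \mathbb{R}$, we have

(4.10) $\mathbb{E}f(T_n) \le \mathbb{E}f(S_n)$.

Proof. Write

(4.11) $q_k = \mathbb{P}\{\theta_k = -x_k\} = 1/(1 + x_k)$, $p_k = \mathbb{P}\{\theta_k = 1\} = x_k/(1 + x_k)$

and notice that $\mathbb{P}\{\varepsilon = -a\} = 1/(1 + a)$ and $\mathbb{P}\{\varepsilon = 1\} = a/(1 + a)$.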
We use well-known properties of Schur convex functions [see Marshall and Olkin
(1979)]. Recall that a vector $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ majorizes
$y = (y_1, \dots, y_n) \in \mathbb{R}^n$ (we use the notation $x \succ y$) if

$x_{n:n} + \dots + x_{k:n} \ge y_{n:n} + \dots + y_{k:n}$ for all $k = 1, \dots, n$,

where $x_{n:n} \ge \dots \ge x_{1:n}$ is a decreasing rearrangement of the sequence $x_1, \dots, x_n$.
Notice that $x \succ y(x)$ for any $x \in \mathbb{R}^n$, where the vector $y(x) = (a, \dots, a)$ has
equal coordinates such that $a = (x_1 + \dots + x_n)/n$.

A real valued function $g$ defined on an open subset $C \subset \mathbb{R}^n$ is called Schur
concave if $x \succ y$ implies $g(x) \le g(y)$. Assuming that $g$ has continuous partial
derivatives such that

(4.12) $\partial_j g - \partial_i g \ge 0$, when $x_i \ge x_j$,

where $\partial_j = \partial/\partial x_j$, a result of Schur [see Schur (1923) and Ostrowski (1952)]
says that $g$ is Schur concave in cases when the set $C$ is a symmetric open convex
set and $g$ is a symmetric function of its arguments. Notice that the result of
Schur still holds if the set $C$ instead of the symmetry assumption satisfies:
there exists a $z = (b, \dots, b) \in \mathbb{R}^n$ such that the set $C - z$ is symmetric. Indeed,
the majorization and (4.12) are preserved by a shift transformation of this
kind.

Write $g(x) = g(x_1, \dots, x_n) = \mathbb{E}f(T_n)$. Due to the result of Schur, to prove
the inequality (4.10) it suffices to check that the function $g$ is a Schur concave
function. Notice that as $C$ we can choose a sufficiently large open cube which,
for a given $a$, contains the set

$\{x : x_1 + \dots + x_n = an,\ x_1, \dots, x_n \ge 0\}$.

Because the cube is open, we have to allow $x_k$ to assume negative values. We
assume that $x_k > -1/3$. Now the probabilities defined by (4.11) can be negative,
and in such cases we understand $\mathbb{E}w(\theta_k)$ as $\mathbb{E}w(\theta_k) = w(-x_k)\,q_k + w(1)\,p_k$.

Due to the symmetry of $g$ in its arguments, it suffices to check the
condition (4.12) with $j = 1$ and $i = 2$. The inequality has to hold for all $t \in \mathbb{R}$.
Therefore, conditioning on $\theta_3, \dots, \theta_n$, it is easy to see that we can assume
that

(4.13) $g(x) = \mathbb{E}f(\theta_1 + \theta_2)$.

To simplify notation write $x_1 = \alpha$ and $x_2 = \beta$. Then

$q_1 = \mathbb{P}\{\theta_1 = -\alpha\} = 1/(1 + \alpha)$, $p_1 = \mathbb{P}\{\theta_1 = 1\} = \alpha/(1 + \alpha)$

and

$q_2 = \mathbb{P}\{\theta_2 = -\beta\} = 1/(1 + \beta)$, $p_2 = \mathbb{P}\{\theta_2 = 1\} = \beta/(1 + \beta)$,
and we have to check that $\partial_\alpha g - \partial_\beta g \ge 0$ assuming that $\beta \ge \alpha$, where
$\partial_\alpha = \partial/\partial\alpha$. For the function $g$ from (4.13) we have

(4.14) $g = f(-\alpha - \beta)\,q_1 q_2 + f(1 - \beta)\,p_1 q_2 + f(1 - \alpha)\,q_1 p_2 + f(2)\,p_1 p_2$.

We consider the following five cases separately:

(i) $t \le -\alpha - \beta$;

(ii) $-\alpha - \beta \le t \le 1 - \beta$;

(iii) $1 - \beta \le t \le 1 - \alpha$;

(iv) $1 - \alpha \le t \le 2$;

(v) $t \ge 2$.

In the proof of (i)-(v) we write $I(t) = \partial_\alpha g - \partial_\beta g$. The function $t \mapsto I(t)$ is a
continuous function. We have to show that $I(t) \ge 0$.

Case (i). In this case $f(x) = (x - t)^2$ on the support of $\theta_1 + \theta_2$ and,
therefore,

(4.15) $g = \mathbb{E}(\theta_1 + \theta_2 - t)^2 = \alpha + \beta + t^2$

and the inequality $I(t) \ge 0$ is just the equality $0 = 0$.

Case (ii). Now [cf. (4.14)]

$g = (1 - \beta - t)^2 p_1 q_2 + (1 - \alpha - t)^2 q_1 p_2 + (2 - t)^2 p_1 p_2$.

Adding and subtracting $(-\alpha - \beta - t)^2 q_1 q_2$ and using (4.15), we have

$g = \alpha + \beta + t^2 - (\alpha + \beta + t)^2 q_1 q_2$.

Using $\partial_\alpha q_1 = -q_1^2$ and $\partial_\beta q_1 = 0$, it is easy to find that

$I(t) = (\alpha + \beta + t)^2\, q_1^2 q_2^2\, (\beta - \alpha) \ge 0$,

which concludes the proof of case (ii).

Case (iii). We have [cf. (4.14)]

$g = (t + \alpha - 1)^2 q_1 p_2 + (t - 2)^2 p_1 p_2$.

Using $\partial_\alpha p_1 = q_1^2$ and $\partial_\beta p_1 = 0$, it is easy to see that

(4.16) $I(t) = 2(t + \alpha - 1)\,q_1 p_2 - (t + \alpha - 1)^2 q_1^2 p_2 + (t - 2)^2 q_1^2 p_2 - (t + \alpha - 1)^2 q_1 q_2^2 - (t - 2)^2 p_1 q_2^2$.

The function $I(t) = At^2 + Bt + C$ is a quadratic function of $t$ with some
$A$, $B$ and $C$. It is clear from (4.16) that $A = -q_1 q_2^2 - p_1 q_2^2 = -q_2^2$. This
means that the function $t \mapsto I(t) : [1 - \beta, 1 - \alpha] \to \mathbb{R}$ is a concave function.
Hence, $I(t) \ge 0$ will follow if we check the inequality at the endpoints of the
interval. However, the inequality $I(1 - \beta) \ge 0$ is already established in (ii).
The inequality $I(1 - \alpha) \ge 0$ is proved in case (iv).
Case (iv). In this case $g = (t - 2)^2 p_1 p_2$ and

$I(t) = (t - 2)^2 (q_1^2 p_2 - p_1 q_2^2) = (t - 2)^2 q_1 q_2 (\beta q_1 - \alpha q_2)$.

Hence, it suffices to check that $\beta q_1 - \alpha q_2 \ge 0$, which is equivalent to
$(\beta - \alpha)(1 + \beta + \alpha) \ge 0$, which is obvious.

Case (v). Now $g = 0$ and there is nothing to prove. The proof of the
lemma is completed. □
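The inequality (4.10) can be checked by brute-force enumeration for small $n$. The following sketch uses the two-point laws from (4.11); the function names and the particular vector `XS` are assumptions of this illustration, and the enumeration costs $2^n$ terms, so it is only meant for small $n$.

```python
from itertools import product


def two_point(x):
    """theta(x, 1) as in (4.11): value -x w.p. 1/(1+x), value 1 w.p. x/(1+x)."""
    return [(-x, 1.0 / (1.0 + x)), (1.0, x / (1.0 + x))]


def expect_pos_sq_of_sum(xs, t):
    """E (theta_1 + ... + theta_n - t)_+^2 for independent theta_k = theta_k(x_k, 1)."""
    total = 0.0
    for combo in product(*(two_point(x) for x in xs)):
        value = sum(v for v, _ in combo)
        prob = 1.0
        for _, pr in combo:
            prob *= pr
        total += prob * max(value - t, 0.0) ** 2
    return total


def schur_gap(xs, t):
    """E f(S_n) - E f(T_n), where S_n uses the averaged parameter a = mean(xs)."""
    a = sum(xs) / len(xs)
    return expect_pos_sq_of_sum([a] * len(xs), t) - expect_pos_sq_of_sum(xs, t)


# Hypothetical parameter vector for the illustration.
XS = [0.2, 1.0, 3.0]

if __name__ == "__main__":
    for t in (-2.0, 0.0, 1.0, 2.5):
        # Lemma 4.5 predicts a nonnegative gap for every t.
        print(t, schur_gap(XS, t))
```

Replacing the parameters by their average keeps the variance $x_1 + \dots + x_n$ of the sum unchanged, so for very negative $t$ the two expectations coincide and the gap is zero up to rounding.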

Lemma 4.6. Let $f$ be one of the following functions:

$f(x) = (x - t)_+^2$, $t \in \mathbb{R}$;
$f(x) = (x - t)_+^s$, $s > 2$;
$f(x) = \exp\{hx\}$, $h > 0$.

(i) Assume that a random variable $X$ satisfies

(4.17) $\mathbb{P}\{X \le b\} = 1$, $\mathbb{E}X^2 \le \sigma^2$.

Then we have $\mathbb{E}f(X) \le \mathbb{E}f(\theta)$, where $\theta$ is a symmetric Bernoulli random
variable $\theta = \theta(a^2, a)$ with $a = \max\{\sigma, b\}$.

(ii) Assume that a martingale $M_n$ satisfies (2.1), that is, $X_k \le b_k$ and
$s_k^2 \le \sigma_k^2$ with probability 1, where $s_k^2$ are the conditional variances of the
differences $X_k$. Let $T_n = \theta_1 + \dots + \theta_n$ be a sum of independent symmetric
Bernoulli random variables $\theta_k = \theta_k(a_k^2, a_k)$ with $a_k = \max\{\sigma_k, b_k\}$. Then
$\mathbb{E}f(M_n) \le \mathbb{E}f(T_n)$.

Proof. Similar to the proof of Lemma 4.4, it suffices to establish (i).
Assume first that $\sigma \le b$. By (i) of Lemma 4.4, we have $\mathbb{E}f(X) \le \mathbb{E}f(\theta_0)$,
where a Bernoulli random variable $\theta_0 = \theta_0(\sigma^2, b)$. The condition $\sigma \le b$ implies

(4.18) $\mathbb{P}\{\theta_0 \le b\} = 1$, $\mathbb{E}\theta_0^2 \le b^2$.

In view of (4.18) we can estimate the expectation $\mathbb{E}f(\theta_0)$ using (i) of
Lemma 4.4. We get $\mathbb{E}f(\theta_0) \le \mathbb{E}f(\theta)$ with a symmetric Bernoulli random
variable $\theta = \theta(a^2, a)$ because $a = \max\{\sigma, b\} = b$, due to the assumption $\sigma \le b$.
Combining the inequalities, we obtain the desired $\mathbb{E}f(X) \le \mathbb{E}f(\theta)$.

Assume now that $\sigma > b$. A random variable which satisfies (4.17) satisfies
as well $\mathbb{P}\{X \le \sigma\} = 1$ and $\mathbb{E}X^2 \le \sigma^2$, and we can again apply (i) of
Lemma 4.4, because now $a = \max\{\sigma, b\} = \sigma$. □



Lemma 4.7. Assume that for a function $Q(a; \sigma^2)$ the bound

(4.19) $\mathbb{P}\{M_n \ge an\} \le Q^n(a; \sigma^2)$

holds for all $n$ and all sums $M_n = \varepsilon_1 + \dots + \varepsilon_n$ of i.i.d. Bernoulli random
variables $\varepsilon_k = \varepsilon_k(\sigma^2, 1)$ so that the conditions of Hoeffding's Theorem 3 are
fulfilled. Then we have

$Q(a; \sigma^2) \ge H(aq + p; p)$, where $p = \sigma^2/(1 + \sigma^2)$, $q = 1 - p$.

Proof. We have $\mathbb{P}\{\varepsilon_k = -\sigma^2\} = q$, $\mathbb{P}\{\varepsilon_k = 1\} = p$ and

$\mathbb{P}\{M_n \ge an\} = \mathbb{P}\{\theta_1 + \dots + \theta_n \ge nz\}$ with $z = aq + p$ and $\theta_k = q\varepsilon_k + p$.

The random variables $\theta_k$ are i.i.d. Bernoulli random variables such that
$\mathbb{P}\{\theta_k = 0\} = q$ and $\mathbb{P}\{\theta_k = 1\} = p$. The inequality (4.19) implies

$\log Q(a; \sigma^2) \ge \frac{1}{n} \log \mathbb{P}\{\theta_1 + \dots + \theta_n \ge nz\}$.

Passing to the limit as $n \to \infty$ and using a well-known result on large
deviations [see Bahadur (1971), Example 1.2], we get

$\log Q(a; \sigma^2) \ge -I$

with $I = z \log(z/p) + (1 - z) \log((1 - z)/(1 - p))$, for $p < z < 1$. Using the
explicit formula (1.18) for $H$, it is clear that $\exp\{-I\} = H(aq + p; p)$, which
proves $Q \ge H$ and the lemma. □
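The large deviation limit used in the proof can be observed numerically. The sketch below computes $-\frac{1}{n}\log \mathbb{P}\{\theta_1 + \dots + \theta_n \ge nz\}$ exactly for a binomial sum and compares it with the rate $I$; the function names and the sample values $n = 2000$, $p = 0.3$, $z = 0.5$ are assumptions of this illustration.

```python
import math


def rate(z, p):
    """I = z log(z/p) + (1-z) log((1-z)/(1-p)), for p < z < 1."""
    return z * math.log(z / p) + (1.0 - z) * math.log((1.0 - z) / (1.0 - p))


def log_binomial_tail(n, p, m):
    """log P{Bin(n, p) >= m}, computed with a log-sum-exp to avoid underflow."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k in range(m, n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def empirical_rate(n, p, z):
    """-(1/n) log P{theta_1 + ... + theta_n >= n z} for i.i.d. 0/1 theta_k with P{theta_k = 1} = p."""
    return -log_binomial_tail(n, p, math.ceil(n * z)) / n


if __name__ == "__main__":
    print(rate(0.5, 0.3), empirical_rate(2000, 0.3, 0.5))
```

The finite-$n$ value stays above $I$ (this is the Chernoff bound) and approaches it as $n$ grows, which is why any $Q$ satisfying (4.19) for all $n$ cannot be smaller than $\exp\{-I\}$.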

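Lemma 4.8. In Theorems 1.1-1.3, $\mathbb{P}\{S_n \ge x\}$ cannot replace
$\mathbb{P}^\circ\{S_n \ge x\}$.

Proof. It suffices to prove the lemma in the case $n = 1$. Let $X$ be
a random variable such that $\mathbb{P}\{X \le 1\} = 1$, $\mathbb{E}X = 0$ and $\mathbb{E}X^2 \le \sigma^2$. Let
$\varepsilon = \varepsilon(\sigma^2, 1)$ be a Bernoulli random variable. To prove the lemma it suffices
to check that

(4.20) $\sup \mathbb{P}\{X \ge 0\} / \mathbb{P}\{\varepsilon \ge 0\} = \infty$.

Taking $X = 0$ we have $\mathbb{P}\{X \ge 0\} = 1$. Using $\mathbb{P}\{\varepsilon \ge 0\} = \sigma^2/(1 + \sigma^2)$, we see
that (4.20) is implied by the obvious $\sup_{\sigma^2 > 0} (1 + \sigma^2)/\sigma^2 = \infty$. □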